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ABSTRACT 

We develop a formulation for molecular dynamics, Langevin, and hybrid Monte Carlo 
algorithms in the recently proposed generalized ensemble that is based on a physically 
motivated realisation of Tsallis weights. The effectiveness of the methods are tested with 
an energy function for a protein system. Simulations in this generalized ensemble by the 
three methods are performed for a penta peptide, Met-enkephalin. For each algorithm, it 
is shown that from only one simulation run one can not only find the global-minimum- 
energy conformation but also obtain probability distributions in canonical ensemble at any 
temperature, which allows the calculation of any thermodynamic quantity as a function 
of temperature. 
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1. INTRODUCTION 

For many important physical systems like biological macromolecules it is very difficult 
to obtain the accurate canonical distribution at low temperatures by conventional sim- 
ulation methods. This is because the energy function has many local minima, sepa- 
rated by high energy barriers, and at low temperatures simulations will necessarily get 
trapped in the configurations corresponding to one of these local minima. In order to over- 
come this multiple-minima problem, many methods have been proposed. For instance, 
the generalized-ensemble algorithms, most well-known of which is the multicanonical ap- 
proach, [||, § are powerful ones and were first introduced to the protein-folding problem in 
Ref. ||. Simulations in the multicanonical ensemble perform ID random walk in energy 
space. They can thus avoid getting trapped in states of energy local minima. Besides mul- 
ticanonical algorithms, simulated tempering j|, [| and l/fc-sampling || have been shown 
to be equally effective generalized-ensemble methods in the protein folding problem. 
The simulations are usually performed with Monte Carlo scheme, but recently molecular 
dynamics version of multicanonical algorithm was also developed. ||, § 

The generalized-ensemble approach is based on non-Boltzmann probability weight fac- 
tors, and in the above three methods the determination of the weight factors is non-trivial. 
We have recently shown that a particular choice of the weight factor of Tsallis statisti- 
cal mechanics, [JT0[] which is a nonextensive generalization of Boltzmann-Gibbs statistical 



mechanics, can be used for a generalized-ensemble Monte Carlo simulation. |I I] The ad- 
vantage of this ensemble is that it greatly simplifies the determination of the weight factor. 

The purpose of the present work is to generalize this Monte Carlo approach to other 
simulation techniques. Here, we consider three commonly used algorithms: molecular 



dynamics, (TT2] Langevin,[|13|| and hybrid Monte Carlo, [frjj The performances of the algo- 
rithms are tested with the system of an oligopeptide, Met-enkephalin. 



2. METHODS 

2.1. Monte Carlo in the new ensemble 

In the canonical ensemble at temperature T each state with potential energy E is weighted 



2 



by the Boltzmann factor: 

W B (E,T) = e~ fiE , (1) 

where the inverse temperature f3 is defined by (3 = 1/ksT with Boltzmann constant /eg. 
This weight factor gives the usual bell-shaped canonical probability distribution of energy: 

P B {E,T)<xn{E) W B (E 7 T) , (2) 

where n(E) is the density of states. For systems with many degrees of freedom, it is 
usually very difficult to generate a canonical distribution at low temperatures. This is 
because there are many local minima in the energy function, and simulations will get 
trapped in states of energy local minima. 

Generalized-ensemble algorithms are the methods that overcome this difficulty by 
performing random walks in energy space, allowing simulations to escape from any state 
of energy local minimum. Here, we discuss one of the latest examples of simulation 



techniques in generalized ensemble. [[Til The probability weight factor of this method is 
given by 

/ E - E r *\~ nF 
W(E)= l+fo ^ , (3) 



rip 

where T = l/knPo is a low temperature, np is the number of degrees of freedom, and Eqs 
is the global-minimum potential energy (when Eqs is not known, we use its best estimate). 
Note that this weight is a special case of the weights used in Tsallis generalized statistical 



mechanics, [ 10 1 where the Tsallis parameter q is chosen as 



9 = 1 + —. (4) 

Up 

Note also that through the substraction of Egs it is ensured that the weights will always 
be positive definite. 

The above choice of q was motivated by the following reasoning. |TT| We are interested 
in an ensemble where not only the low-energy region can be sampled efficiently but also 
the high-energy states can be visited with finite probability. In this way the simulation 
can overcome energy barriers and escape from local minima. The probability distribution 
of energy should resemble that of an ideal low-temperature canonical distribution, but 
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with a tail to higher energies. The Tsallis weight at low temperature 

W T s(E) = [1 + (q - 1)(3 (E - E GS )r^ (5) 

has the required properties when the parameter q is carefully chosen. Namely, for suitable 
q > 1 it is a good approximation of the Boltzmann weight Wb{E,T ) = exp(—/3o(E — 
Egs)) f° r (<? ~ 1)A)(-E ~ Eos) <C 1 , while at high energies it is no longer exponentially 
suppressed but only according to a power law with the exponent l/(q — 1). To ensure 
that simulations are able to escape from energy local minima, the weight should start 
deviating from the exponentially damped Boltzmann weight at energies near its mean 
value. This is because at low temperatures there are only small fluctuations of energy 
around its mean (< E >t )- m Eq. (^) we ma y thus set 

(q - 1) fa (< E > To -E GS ) = ~ ■ (6) 

The mean value at low temperatures is given by the harmonic approximation: 

<E> To =E GS + 1 ^-k B T, = E GS + ^ . (7) 

Substituting this value into Eq. (|B|), we obtain the optimal Tsallis parameter in Eq. (^). 

We remark that the calculation of the weight factor is much easier than in other 
generalized-ensemble techniques, since it requires one to find only an estimator for the 
ground-state energy E GS , which can be done, for instance, by a procedure described in 



Ref. [11] 



As in the case of other generalized ensembles, we can use the reweighting techniques 
to construct canonical distributions at various temperatures T. This is because the 
simulation by the present algorithm samples a large range of energies. The thermodynamic 
average of any physical quantity A can be calculated over a wide temperature range by 

/ dx A(x) W-\E(x)) e~ mx) 

<A> T = 1 — T , (8) 

/ dx W-\E{x)) e~ mx) 

where x stands for configurations. 
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In the following subsections, we describe how to implement Langevin, molecular dy- 
namics, and hybrid Monte Carlo algorithms in the new ensemble defined by the weight 
of Eq. (|3]). We remark that Langevin and molecular dynamics algorithms for Tsallis 



statistical mechanics were also developed in Refs. flnj and |l7| , respectively 
2.2. Langevin algorithm 



The Langevin algorithm [13| is used to integrate the following differential equation: 



dE 

Qi = ~ ^~ + Vi = fi + Vi , 



(9) 



where (z = 1, • • • , N) is the (generalized) coordinates of the system, E is the potential 
energy, f\ is the "force" acting on the particle at g,, and rji is a set of independent Gaussian 
distributed random variables with a variance: 



< Vi(ti)Vj(tm) >= 2k B T S ij 5(ti - t m ). 



(10) 



Here (and hereafter), we set all the masses m ; equal to unity for simplicity. It can be 
shown that the dynamics based on the Langevin algorithm yields a canonical distribution 
Pb(E,T ) oc n(E) Wb{E,T ) = n(E)e~ l3 ° E . In order to generalize this technique to 
simulations in the new ensemble, we rewrite the weight factor in Eq. @ as 

E — E GS 



W(E) = exp -fa 



Po 



n F 



(11) 



Defining now an effective potential energy by |TB, [17 



(12) 



we see that Langevin simulations in the new ensemble can be performed by replacing E 
in Eq. © by E eff (E): 

dE eff (E)dE 



dE dq t 
1 



1 + ^(E-E GS ) 

71f 



fi + Vi ■ 



(13) 
(14) 



Note that the procedure that led to the above equations is exactly the same as the one 
we followed when we developed molecular dynamics and related algorithms in another 
generalized ensemble, i.e., multicanonical ensemble. § 



For numerical work one has to integrate the above equation by discretizing the time 
with step At and therefore for actual simulations we use the following difference equation: 

/ \ 



qi(t + At) = fc(t) + At 



1 



1 + ^ (E(t) - Eqs) 

Tip 



fi{t)+Vi(t) 



(15) 



/ 



Using the above equation we will sample in the Langevin simulation the same ensemble 
as in a Monte Carlo simulation with the weight of Eq. (|3]). Hence, we can again use the 
re- weighting techniques and calculate thermodynamic averages according to Eq. (g). 
2.3. Molecular dynamics and hybrid Monte Carlo algorithms 

Once the formulation of Langevin algorithm for the new ensemble is given, the implemen- 
tation of molecular dynamics algorithm is straightforward. 

The classical molecular dynamics algorithm is based on a Hamiltonian 

1 N 



/ 1=1 



(16) 



7T; 



where 7Tj are the conjugate momenta corresponding to the coordinates q^. Hamilton's 
equations of motion are then given by 

dH 

r> r> Ji 1 

oqi dqi 

and they are used to generate representative ensembles of configurations. For numerical 
work the time is discretized with step At and the equations are integrated according to 
the leapfrog (or other time reversible integration) scheme: 

%(t + At) = q t (t) + At it, (t + ^) , 

/ 3 \ / At\ V J ( 18 ) 

7T^t+-AtJ =7T i i [ t+ —J + At fi(t + At) . 

The initial momenta {7Tj(^)} for the iteration are prepared by 



/At 



Tii 



At 



— ) = 7Tj(0) + -7r/i(0) 



(19) 



V 2 J ' 2 

with appropriately chosen qi(0) and 7Tj(0) (7rj(0) is from a Gaussian distribution). 

In order to generalize this widely used technique to simulations in our case, we again 
propose to replace E by E e ff (of Eq. (0)) in Eq. ([171) . A new set of Hamilton's equations 
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of motion are now given by 

rip 

This is the set of equations we adopt for MD simulation in our new ensemble. For nu- 
merical work the time is again discretized with step At and the equations are integrated 
according to the leapfrog scheme. 



The hybrid Monte Carlo algorithmic is based on the combination of molecular dy- 
namics and Metropolis Monte Carlo algorithms . Namely, each proposal for the Monte 
Carlo update is prepared by a short MD run starting from the actual configuration. In 
this sense, the algorithm is based on a global update, while in the conventional Metropo- 
lis method one is usually restricted to a local update. Furthermore, the Metropolis step 
ensures that the sampled configurations are distributed according to the chosen ensem- 
ble, while convential molecular dynamics simulations are hampered by difficult-to-control 
systematic errors due to finite step size in the integration of the equations of motion. 

Given the set of coordinates {q{\ of the previous configuration and choosing the cor- 
responding momenta {tt,i} from a Gaussian distribution, a certain number of MD steps 
are performed to obtain a candidate configuration {g-,^}. This candidate is accepted 
according to the Metropolis Monte Carlo criterion with probability 

p = mm{l,e- f3 ^'^- H ^} , (21) 

where H is the Hamiltonian in Eq. (|16|) . The time reversibility of the leapfrog integration 
scheme ensures detailed balance and therefore convergence to the correct distribution. The 
whole process is repeated for a desired number of times (Monte Carlo steps). The number 
of integration (leapfrog) steps N LF and the size of the time step At are free parameters 
in the hybrid Monte Carlo algorithm, which have to be tuned carefully. A choice of too 
small Nlf and At means that the sampled configurations are too much correlated, while 
too large Nlf (or At) yields high rejection rates. In both cases the algorithm becomes 
inefficient. The generalization of this technique to simulations for our ensemble can again 
be made by replacing the potential energy E by E e ff (of Eq. fll2|) ) in the Hamiltonian of 



Eq. (0). 



3. RESULTS AND DISCUSSION 

The effectiveness of the algorithms presented in the previous section is tested for the 
system of an oligopeptide, Met-enkephalin. This peptide has the amino-acid sequence 
Tyr-Gly-Gly-Phe-Met. The potential energy function that we used is given by the sum 
of electrostatic term, Lennard- Jones term, and hydrogen-bond term for all pairs of atoms 
in the peptide together with the torsion term for all torsion angles. The parameters for 
the energy function were adopted from ECEPP / 2 . ||1 9|| - 1 2l| The computer code SMC [ 2~2| 
was modified to accomodate the algorithms. 

For the generalized coordinates we used the dihedral angles. The peptide-bond 
dihedral angles u were fixed to be 180° for simplicity. This leaves 19 dihedral angles 
as generalized coordinates (uf = 19). The global-minimum potential energy Eqs in 
this case was obtained previously and we have Eqs — —10.7 kcal/mol.||26|| As for the 
temperature, we set T = 50 K (or, j3 = 10.1 [ kcaJ / m ol ]), following the case for the Monte 
Carlo simulation in the present generalized ensemble. Jll] We used these numerical values 
for rip, Eos-, and (3q in Eq. We remark that the convention for energy values in the 
present code, SMC, []S2[ is slightly different from the one in the previous Monte Carlo 



work. [|Tl|] Thus, the above value of Eqs is accordingly different from that in Ref . [TT| . By 
the definition of generalized ensembles, which are based on non-Boltzmann weight factors, 
one cannot obtain information on the real dynamics of the system by the MD algorithm, 
and only static thermodynamic quantities can be calculated. For this reason we do not 



need to consider the equations of motion for dihedral space as presented in Ref. [^j , but 
can use the much simpler form for the kinetic energy term as given in the previous section 
(see Eq. (ID). 

For the MD simulations in our ensemble, we made a single production run with the 
total number of time steps Nlf = 800, 000 x 19 and the time-step size At = 0.0075 (in 
arbitrary units). For Langevin algorithm, a production run with the same number of time 
steps (Nlf = 800, 000 x 19) as in the MD simulation was performed, but our optimal time- 
step size was only At = 0.00028. This indicates that the simulation moves more slowly 



8 



through phase space, and we expect slower convergence to the Tsallis distribution than 
in MD case. For the hybrid Monte Carlo algorithm, an MD simulation with 19 leapfrog 
steps was made for each Monte Carlo step and a production run with 400,000 MC steps 
was made. Since the Metropolis step in hybrid Monte Carlo corrects for errors due to 
the numerical integration of the equation of motion, the time-step size At can be large 
for this algorithm. We chose At = 0.01375 in our units. The initial conformation for all 
three simulations was the final (and therefore equilibrized) conformation obtained from a 
Monte Carlo simulation of 200,000 sweeps, following 1,000 sweeps for thermalization with 
the same weight (in each sweep all of the 19 angles were updated once). 

In Fig. 1 the time series of the total potential energy are shown for the three simulations 
with the new weight. They all display a random walk in energy space as they should for a 
simulation with the weight of Eq. (^). All the lowest-energy conformations obtained were 
essentially the same (with only a small amount of deviations for each dihedral angle) as 
that of the global-minimum energy conformation previously obtained for the same energy 
function (with uj = 180°) by other methods. p4|, |^, ^5] The global-minimum potential 
energy value obtained by minimization is —10.7 kcal/mol. |25|] The random walks of the 
MD and hybrid MC simulations visited the global- minimum region (E < —10 kcal/mol) 
six times and eight times, respectively, while that of the Langevin simulation reached the 
region only twice. These visits are separated by the walks towards the high-energy region 
much above E — 16 kcal/mol, which corresponds to the average energy at T = 1000 
K.|| Hence, the rate of convergence to the generalized ensemble is the same order for all 
three methods (with MD and Langevin algorithms slightly slower). As discussed below, 
however, the results of thermodynamic quantity calculations all agree with each other, 
implying that the methods are equally reliable. 

In Fig. 2 the time series of the overlap O of the conformation with the ground state is 
plotted. Our definition of the overlap, which measures how similar a given conformation 
is to the lowest-energy conformation, is given by 

1 n F 

0{t) = l-—-Y.\^-t S) \ , (22) 
where 9\ and 6] (in degrees) stand for the rip torsion angles of the conformation at 
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t-th simulation step and the lowest-energy conformation, respectively. Symmetries for 
the side-chain angles were taken into account and the difference df — 9^ S ^ was always 
projected into the interval [—180°, 180°]. Our definition guarantees that we have 

< < O > T < 1 , (23) 

with the limiting values 

r < (t) > T - 1 , t - o , 

\ < 0{t) > T -»■ , T -> oo . 1 > 

Only the result from the hybrid Monte Carlo simulation is given in Fig. 2, since the other 
two simulations give similar results. Note that there is a clear anti-correlation between 
the potential energy E and overlap O (compare Figs, lc and 2), indicating that the lower 
the potential energy is, the larger the overlap is (closer to the ground state). 

Simulations in generalized ensemble can not only find the energy global minimum but 
also any thermodynamic quantity as a function of temperature from a single simulation 
run. As an example, we show in Fig. 3 the average potential energy as a function of tem- 
perature calculated from independent runs of the three algorithms together with that from 
Monte Carlo results of Ref. |26| which rely on a multicanonical Monte Carlo simulation. 



The results all agree within error bars. 

Another example of such a calculation is the average overlap as a function of tem- 
perature. The results are essentially the same for the three algorithms. That from the 
MD algorithm is shown in Fig. 4. We see that the average overlap approaches 1 in the 
limit the temperature going to zero, as it should (see Eq. (0)). We remark that the 
average overlap approaches the other limiting value, zero (see Eq. (p4|)), only very slowly 
as the temperature increases. This is because < O >t = corresponds to a completely 
random distribution of dihedral angles which is energetically highly unfavorable because 
of the steric hindrance of both main and side chains. 



CONCLUSIONS 

In this article we have shown that the generalized-ensemble algorithm based on a special 
realisation of Tsallis weights is not restricted to Monte Carlo simulations, but can also be 
used in combination with other simulation methods such as molecular dynamics, Langevin, 
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and hybrid Monte Carlo algorithms. We have tested the performances of the above three 
methods in the generalized ensemble for a simple peptide, Met-enkephalin. The results 
were comparable to those of the original Monte Carlo version [11] in that the rate of 



convergence to the generalized ensemble is of the same order and that the thermodynamic 
quantities calculated as functions of temperature all agree with each other. We believe 
that there is a wide range of applications for the generalized-ensemble versions of molecular 
dynamics and related algorithms. For instance, the generalized-ensemble MD simulations 
may prove to be a valuable tool for refinement of protein structures inferred from X-ray 
and/or NMR experiments. 
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Figure Captions 



FIG. 1. (a) Time series of the total potential energy E (kcal/mol) from a Langevin 
simulation in the new generalized ensemble. The simulation consists of 800, 000 x 19 
time steps with step size At = 0.000028. (b) Time series of E from a molecular 
dynamics simulation in the new generalized ensemble. The simulation consists of 
800, 000 x 19 time steps with step size At = 0.0075. (c) Time series of E from a 
hybrid Monte Carlo simulation in the new generalized ensemble. The simulation 
consists of 400,000 MC steps. For each MC step an MD run of 19 time steps was 
made with step size At = 0.01375. 

FIG. 2. Time series of overlap function O (as defined in the text) from the hybrid 
Monte Carlo simulation in the new generalized ensemble. The simulation consists 
of 400,000 MC steps. For each MC step an MD run of 19 time steps was made with 
step size At = 0.01375. 

Fig. 3. The average potential energy < E >t (kcal/mol) as a function of temper- 
ature T (K) obtained from independent runs of the three algorithms and a multi- 



canonical Monte Carlo simulation of Ref. [26 



FIG. 4: The average overlap function < O >t (defined in the text) as a function 
of temperature T (K) obtained from the molecular dynamics simulation in the new 
generalized ensemble. 
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